AI Safety Institute
Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
Korbak, Tomek, Balesni, Mikita, Barnes, Elizabeth, Bengio, Yoshua, Benton, Joe, Bloom, Joseph, Chen, Mark, Cooney, Alan, Dafoe, Allan, Dragan, Anca, Emmons, Scott, Evans, Owain, Farhi, David, Greenblatt, Ryan, Hendrycks, Dan, Hobbhahn, Marius, Hubinger, Evan, Irving, Geoffrey, Jenner, Erik, Kokotajlo, Daniel, Krakovna, Victoria, Legg, Shane, Lindner, David, Luan, David, Mądry, Aleksander, Michael, Julian, Nanda, Neel, Orr, Dave, Pachocki, Jakub, Perez, Ethan, Phuong, Mary, Roger, Fabien, Saxe, Joshua, Shlegeris, Buck, Soto, Martín, Steinberger, Eric, Wang, Jasmine, Zaremba, Wojciech, Baker, Bowen, Shah, Rohin, Mikulik, Vlad
AI systems that "think" in human language offer a unique opportunity for AI safety: we can monitor their chains of thought (CoT) for the intent to misbehave. Like all other known AI oversight methods, CoT monitoring is imperfect and allows some misbehavior to go unnoticed. Nevertheless, it shows promise and we recommend further research into CoT monitorability and investment in CoT monitoring alongside existing safety methods. Because CoT monitorability may be fragile, we recommend that frontier model developers consider the impact of development decisions on CoT monitorability.
- North America > United States (0.14)
- North America > Canada > Ontario > Toronto (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- Information Technology > Security & Privacy (0.46)
- Government > Military (0.46)
From Turing to Tomorrow: The UK's Approach to AI Regulation
Ritchie, Oliver, Anderljung, Markus, Rachman, Tom
The UK has pursued a distinctive path in AI regulation: less cautious than the EU but more willing to address risks than the US, and has emerged as a global leader in coordinating AI safety efforts. Impressive developments from companies like London-based DeepMind began to spark concerns in the UK about catastrophic risks from around 2012, although regulatory discussion at the time focussed on bias and discrimination. By 2022, these discussions had evolved into a "pro-innovation" strategy, in which the government directed existing regulators to take a light-touch approach, governing AI at point of use, but avoided regulating the technology or infrastructure directly. ChatGPT arrived in late 2022, galvanising concerns that this approach may be insufficient. The UK responded by establishing an AI Safety Institute to monitor risks and hosting the first international AI Safety Summit in 2023, but - unlike the EU - refrained from regulating frontier AI development in addition to its use. A new government elected in 2024 promised to address this gap, but at the time of writing has yet to do so. What should the UK do next? The government faces competing objectives: harnessing AI for economic growth and better public services while mitigating risk. In light of these, we propose a flexible, principles-based regulator to oversee the most advanced AI development and defensive measures against risks from AI-enabled biological design tools, and argue that more technical work is needed to understand how to respond to AI-generated misinformation. We argue for updated legal frameworks on copyright, discrimination, and AI agents, and that regulators will have a limited but important role if AI substantially disrupts labour markets. If the UK gets AI regulation right, it could demonstrate how democratic societies can harness AI's benefits while managing its risks.
- North America > United States > California (0.28)
- Europe > France (0.14)
- North America > Canada > Ontario > Toronto (0.14)
- Law > Statutes (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Government > Regional Government > Europe Government > United Kingdom Government (1.00)
- Banking & Finance > Economy (1.00)
Under Trump, AI Scientists Are Told to Remove 'Ideological Bias' From Powerful Models
The National Institute of Standards and Technology (NIST) has issued new instructions to scientists who partner with the US Artificial Intelligence Safety Institute (AISI) that eliminate mention of "AI safety," "responsible AI," and "AI fairness" from the skills it expects of members and introduce a request to prioritize "reducing ideological bias, to enable human flourishing and economic competitiveness." The information comes as part of an updated cooperative research and development agreement for AI Safety Institute consortium members, sent in early March. Previously, that agreement encouraged researchers to contribute technical work that could help identify and fix discriminatory model behavior related to gender, race, age, or wealth inequality. Such biases are hugely important because they can directly affect end users and disproportionately harm minorities and economically disadvantaged groups. The new agreement removes mention of developing tools "for authenticating content and tracking its provenance" as well as "labeling synthetic content," signaling less interest in tracking misinformation and deep fakes.
WebGames: Challenging General-Purpose Web-Browsing AI Agents
Thomas, George, Chan, Alex J., Kang, Jikun, Wu, Wenqi, Christianos, Filippos, Greenlee, Fraser, Toulis, Andy, Purtorab, Marvin
We introduce WebGames, a comprehensive benchmark suite designed to evaluate general-purpose web-browsing AI agents through a collection of 50+ interactive challenges. These challenges are specifically crafted to be straightforward for humans while systematically testing the limitations of current AI systems across fundamental browser interactions, advanced input processing, cognitive tasks, workflow automation, and interactive entertainment. Our framework eliminates external dependencies through a hermetic testing environment, ensuring reproducible evaluation with verifiable ground-truth solutions. We evaluate leading vision-language models including GPT-4o, Claude Computer-Use, Gemini-1.5-Pro, and Qwen2-VL against human performance. Results reveal a substantial capability gap, with the best AI system achieving only a 43.1% success rate compared to human performance of 95.7%, highlighting fundamental limitations in current AI systems' ability to handle common web interaction patterns that humans find intuitive. The benchmark is publicly available at webgames.convergence.ai, offering a lightweight, client-side implementation that facilitates rapid evaluation cycles. Through its modular architecture and standardized challenge specifications, WebGames provides a robust foundation for measuring progress in the development of more capable web-browsing agents.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
The National Institute of Standards and Technology Braces for Mass Firings
Sweeping layoffs architected by the Trump administration and the so-called Department of Government Efficiency may be coming as soon as this week at the National Institute of Standards and Technology (NIST), a non-regulatory agency responsible for establishing benchmarks that ensure everything from beauty products to quantum computers are safe and reliable. According to several current and former employees at NIST, the agency has been bracing for cuts since President Donald Trump took office last month and ordered billionaire Elon Musk and DOGE to slash spending across the federal government. The fears were heightened last week when some NIST workers witnessed a handful of people they believed to be associated with DOGE inside Building 225, which houses the NIST Information Technology Laboratory at the agency's Gaithersburg, Maryland campus, according to multiple people briefed on the sightings. The DOGE staff were seeking access to NIST's IT systems, one of the people said. Soon after the purported visit, NIST leadership told employees that DOGE staffers were not currently on campus, but that office space and technology were being provisioned for them, according to the same people.
Using tournaments to calculate AUROC for zero-shot classification with LLMs
Yoon, Wonjin, Bulovic, Ian, Miller, Timothy A.
Large language models perform surprisingly well on many zero-shot classification tasks, but are difficult to fairly compare to supervised classifiers due to the lack of a modifiable decision boundary. In this work, we propose and evaluate a method that converts binary classification tasks into pairwise comparison tasks, obtaining relative rankings from LLMs. Repeated pairwise comparisons can be used to score instances using the Elo rating system (used in chess and other competitions), inducing a confidence ordering over instances in a dataset. We evaluate scheduling algorithms for their ability to minimize comparisons, and show that our proposed algorithm leads to improved classification performance, while also providing more information than traditional zero-shot classification.
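The Elo-based scoring described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the pairwise judge here is a hypothetical stand-in (a noisy latent score in place of an LLM comparison prompt), pairings are sampled uniformly at random rather than by the paper's scheduling algorithm, and AUROC is computed with the standard rank-sum formulation over the induced confidence ordering:

```python
import random

def elo_update(r_a, r_b, a_wins, k=32):
    """Standard Elo update: expected score for A, then symmetric rating adjustment."""
    e_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    s_a = 1.0 if a_wins else 0.0
    return r_a + k * (s_a - e_a), r_b - k * (s_a - e_a)

def auroc(scores, labels):
    """AUROC via the rank-sum (Mann-Whitney) formulation: the fraction of
    positive/negative pairs the scores order correctly (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical stand-in for an LLM pairwise judge: instance i "beats" j when
# its latent score is higher (label plus noise, so positives tend to win).
random.seed(0)
labels = [1, 1, 1, 0, 0, 0, 1, 0]
latent = [y + random.gauss(0, 0.3) for y in labels]
judge = lambda i, j: latent[i] > latent[j]

# Tournament: repeated random pairings update the Elo ratings, which induce
# a confidence ordering over the dataset.
ratings = [1000.0] * len(labels)
for _ in range(200):
    i, j = random.sample(range(len(labels)), 2)
    ratings[i], ratings[j] = elo_update(ratings[i], ratings[j], judge(i, j))

print(auroc(ratings, labels))
```

Because the ratings only need to rank instances, not cross a fixed decision boundary, AUROC over the Elo scores gives the threshold-free comparison to supervised classifiers that plain zero-shot label outputs cannot.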
- Research Report > Experimental Study (0.69)
- Research Report > New Finding (0.47)
- Health & Medicine (1.00)
- Leisure & Entertainment > Games > Chess (0.58)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.33)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.31)
Which Information should the UK and US AISI share with an International Network of AISIs? Opportunities, Risks, and a Tentative Proposal
The UK AI Safety Institute (UK AISI) and its parallel organisation in the United States (US AISI) occupy a unique position in the recently established International Network of AISIs. Both are in jurisdictions with frontier AI companies and are assuming leading roles in the international conversation on AI safety. This paper argues that it is in the interest of both institutions to share specific categories of information with the International Network of AISIs, to deliberately abstain from sharing others, and to carefully evaluate sharing some categories on a case-by-case basis, according to domestic priorities. The paper further proposes a provisional framework with which policymakers and researchers can distinguish between these three cases, taking into account the potential benefits and risks of sharing specific categories of information, ranging from pre-deployment evaluation results to evaluation standards. To further improve research on policy-relevant information-sharing decisions, the paper emphasises the importance of continuously monitoring the fluctuating factors that influence sharing decisions, and calls for more in-depth analysis of specific policy-relevant information categories and additional factors in future research.
Enabling External Scrutiny of AI Systems with Privacy-Enhancing Technologies
This article describes how technical infrastructure developed by the nonprofit OpenMined enables external scrutiny of AI systems without compromising sensitive information. Independent external scrutiny of AI systems provides crucial transparency into AI development, so it should be an integral component of any approach to AI governance. In practice, external researchers have struggled to gain access to AI systems because of AI companies' legitimate concerns about security, privacy, and intellectual property. But now, privacy-enhancing technologies (PETs) have reached a new level of maturity: end-to-end technical infrastructure developed by OpenMined combines several PETs into various setups that enable privacy-preserving audits of AI systems. We showcase two case studies where this infrastructure has been deployed in real-world governance scenarios: "Understanding Social Media Recommendation Algorithms with the Christchurch Call" and "Evaluating Frontier Models with the UK AI Safety Institute." We describe types of scrutiny of AI systems that could be facilitated by current setups and OpenMined's proposed future setups. We conclude that these innovative approaches deserve further exploration and support from the AI governance community. Interested policymakers can focus on empowering researchers at the legal level.
- Europe > United Kingdom (0.05)
- Oceania > New Zealand (0.04)
- North America > United States > Delaware (0.04)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Government > Regional Government > North America Government > United States Government (0.47)
British AI startup with government ties is developing tech for military drones
A company that has worked closely with the UK government on artificial intelligence safety, the NHS and education is also developing AI for military drones. The consultancy Faculty AI has "experience developing and deploying AI models on to UAVs", or unmanned aerial vehicles, according to a defence industry partner company. Faculty has emerged as one of the most active companies selling AI services in the UK. Unlike OpenAI, DeepMind or Anthropic, it does not develop models itself, instead focusing on reselling models, notably from OpenAI, and consulting on their use in government and industry. Faculty gained particular prominence in the UK after working on data analysis for the Vote Leave campaign before the Brexit vote.
- Government > Regional Government > Europe Government > United Kingdom Government (1.00)
- Government > Military (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.91)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.58)
- Asia > China (0.16)
- North America > United States > California > San Francisco County > San Francisco (0.07)
- Europe > France (0.06)